dot product in ML

Tiwari
February 05, 2024

Dot Product:

When you multiply a matrix $A$ of size $n \times d$ with a matrix $B$ of size $d \times n$, the resulting matrix $C$ is of size $n \times n$. Each entry $c_{ij}$ of this matrix is the dot product of the $i$-th row of matrix $A$ with the $j$-th column of matrix $B$.

For matrices:

$$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1d} \\ a_{21} & a_{22} & \dots & a_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nd} \end{bmatrix}$$

and

$$B = \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{d1} & b_{d2} & \dots & b_{dn} \end{bmatrix}$$

the resulting matrix $C$ is:

$$C = \begin{bmatrix} c_{11} & c_{12} & \dots & c_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \dots & c_{nn} \end{bmatrix}$$

where each $c_{ij}$ is:

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \dots + a_{id}b_{dj}$$

This is the dot product of the $i$-th row of $A$ with the $j$-th column of $B$.
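A minimal NumPy sketch makes this entry-by-entry definition concrete; the matrices here are small random placeholders chosen only for illustration:

```python
import numpy as np

n, d = 3, 4
rng = np.random.default_rng(0)
A = rng.normal(size=(n, d))   # A is n x d
B = rng.normal(size=(d, n))   # B is d x n

C = A @ B                     # C is n x n

# Each entry C[i, j] equals the dot product of row i of A with column j of B.
i, j = 1, 2
assert np.isclose(C[i, j], np.dot(A[i, :], B[:, j]))
print(C.shape)  # (3, 3)
```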

The dot product, in this context, measures how "similar" two vectors are. If two vectors are orthogonal (i.e., perpendicular to each other), their dot product is zero. If they point in broadly the same direction (the angle between them is less than 90°), their dot product is positive; if they point in broadly opposite directions (the angle is greater than 90°), it is negative.

Thus, the resulting $n \times n$ matrix $C$ essentially captures the similarity between the rows of matrix $A$ and the columns of matrix $B$. If matrix $B$ is the transpose of matrix $A$, then the resulting matrix $C$ captures the pairwise similarity between the rows of matrix $A$ with itself.
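A small sketch of that special case, using a made-up "row embedding" matrix: $A A^{\top}$ (the Gram matrix) holds every pairwise dot product between rows of $A$.

```python
import numpy as np

# Toy row embeddings (values chosen only for illustration).
A = np.array([[1.0, 0.0, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

G = A @ A.T   # 4 x 4 matrix: G[i, j] is the dot product of row i with row j

print(np.round(G, 2))
# Rows 0 and 1 point in nearly the same direction, so G[0, 1] is relatively large;
# rows 0 and 2 are orthogonal, so G[0, 2] is 0.
```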


Applications of the Dot Product


Similarity measures:

Similarity measures, especially the dot product, play a foundational role in many machine learning and NLP (Natural Language Processing) applications.

  1. Cosine Similarity in Information Retrieval & Document Similarity:

    • Each document can be represented as a vector where each dimension corresponds to a term (word) from a vocabulary, and the value can be the term frequency or TF-IDF (term frequency-inverse document frequency) score.
    • The cosine similarity between two document vectors gives a measure of how similar the two documents are content-wise. It's called cosine similarity because it's the cosine of the angle between two vectors. If the vectors are orthogonal (angle = 90°), the cosine similarity is 0; if they point in the same direction (angle = 0°), it's 1. (A minimal cosine-similarity sketch appears after this list.)
  2. Word Embeddings:

    • In modern NLP, words are often represented as dense vectors (word embeddings) in a continuous space, e.g., Word2Vec, GloVe, and embeddings from models like BERT.
    • The similarity between two word embeddings can be measured using the dot product or cosine similarity, which reflects how semantically close the words are. For example, "king" and "queen" would be closer in this space than "king" and "apple".
  3. Neural Network Activations:

    • In neural networks, especially deep learning models, the dot product is used extensively in fully connected layers, convolutional layers, etc.
    • The weights in a neural network can be thought of as learning to recognize certain patterns or features. The dot product between an input vector and a weight vector can measure how much of the feature represented by the weights is present in the input.
  4. Attention Mechanisms:

    • In transformer-based models like BERT, GPT, etc., attention mechanisms are used to weigh the importance of different parts of an input sequence when producing an output.
    • Dot products are used in these attention calculations to measure the relevance of different parts of the input to the current computation. (A sketch of scaled dot-product attention appears after this list.)
  5. Recommendation Systems:

    • User and item interactions can be represented in a matrix. Matrix factorization techniques, like Singular Value Decomposition (SVD), can decompose this matrix into user and item embeddings.
    • The dot product between a user and an item embedding can predict the user's preference for that item. (See the recommendation sketch after this list.)
  6. Clustering & Dimensionality Reduction:

    • Similarity measures can be used in clustering algorithms like K-means to group similar data points together.
    • Techniques like t-SNE use pairwise similarities to reduce the dimensionality of data while preserving local structures.
  7. Semantic Search:

    • Given a query, instead of just searching for documents with exact matching words, you can search for documents whose embeddings are semantically close to the query embedding.
  8. Analogical Reasoning:

    • With word embeddings, analogies like "man" is to "woman" as "king" is to "queen" can be solved by vector arithmetic. This relies on the cosine similarity between word vectors.
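
To make the similarity-based uses (items 1, 2, and 7) concrete, here is a minimal cosine-similarity sketch in NumPy. The 3-dimensional "embeddings" below are made-up values standing in for real document or word vectors:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: their dot product after normalization."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional "embeddings" (illustrative values only).
king  = np.array([0.8, 0.6, 0.1])
queen = np.array([0.7, 0.7, 0.2])
apple = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(king, queen))  # close to 1: semantically similar
print(cosine_similarity(king, apple))  # much smaller: less similar
```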
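For item 4, the core of scaled dot-product attention is a matrix of query-key dot products, $\mathrm{softmax}(QK^{\top}/\sqrt{d_k})\,V$. A minimal sketch with toy random matrices (no masking or multiple heads):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V — dot products score query-key relevance."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # (seq_q, seq_k) scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                         # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```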
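And for item 5, a user's predicted preference for an item can be estimated as the dot product of their latent factors. A toy sketch with hypothetical, hand-picked factor vectors:

```python
import numpy as np

# Hypothetical 4-dimensional latent factors, as learned by matrix factorization.
user_embedding = np.array([0.9, 0.1, 0.4, 0.0])
item_embedding = np.array([0.8, 0.0, 0.5, 0.2])

predicted_rating = float(np.dot(user_embedding, item_embedding))
print(predicted_rating)  # higher value -> stronger predicted preference
```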

The dot product can be thought of in two primary ways:

  1. Similarity:

    • When both vectors being dotted are normalized (i.e., have a magnitude of 1), the dot product gives the cosine of the angle between them. This is known as the cosine similarity.
    • If the dot product is 1, the vectors are identical (angle of 0°).
    • If the dot product is 0, the vectors are orthogonal or perpendicular (angle of 90°), meaning they share no similarity.
    • If the dot product is -1, they are diametrically opposed (angle of 180°).
  2. Weighted Sum:

    • The dot product can be seen as a weighted sum when one vector represents weights and the other represents values. In this interpretation, the dot product gives a single value that is the sum of products of corresponding entries of the two sequences of numbers.
    • This interpretation is prevalent in neural networks where input values are weighted by learned weights.

Both interpretations are valid, and the context in which you're working determines which interpretation is more appropriate. For example, in machine learning and specifically in neural networks, the weighted sum interpretation is often more relevant, whereas in vector space models in NLP or information retrieval, the similarity interpretation is more common.
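
As a small illustration of the weighted-sum view, a single artificial neuron's pre-activation is exactly this dot product; the weights, inputs, and bias below are made-up values:

```python
import numpy as np

# Weighted-sum view: a single neuron's pre-activation is a dot product.
x = np.array([0.5, -1.0, 2.0])            # input values (illustrative)
w = np.array([0.3,  0.8, -0.1])           # learned weights (illustrative)
b = 0.05                                  # bias term

pre_activation = np.dot(w, x) + b         # weighted sum of the inputs
output = np.maximum(0.0, pre_activation)  # e.g., a ReLU activation
print(pre_activation, output)
```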
